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Abstract 

We discuss the implementation of a Sheikholeslami-Wohlert term 
for simulations of lattice QCD with dynamical Wilson fermions as 
required by Symanzik's improvement program. We show that for 
the Hybrid Monte Carlo or Kramers equation algorithm standard 
even-odd preconditioning can be maintained. We design tests of 
the implementation using analytically and numerically computed cu- 
mulant expansions. We find that, for situations where the average 
number of Conjugate Gradient iterations exceeds 200, the overhead 
is only about 20%. 
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1 Introduction 



In Wilson's formulation of lattice QCD, when working at non-vanishing quark 
masses, chiral symmetry is violated. Its restoration requires the approach to 
the continuum limit and therefore small values of the lattice spacing a. Unfor- 
tunately, for small values of a and light quark masses one needs large lattices 
and simulations easily become prohibitively costly. If, on the other hand, one 
stays at too large values of the lattice spacing, one has to face cut-off effects 
which are linear in the lattice spacing in the case of Wilson fermions. Indeed, in 
quenched simulations, using Wilson fermions, the lattice spacing effects were 
shown to be a severe problem [0. 

A well-known remedy of the 0(a) effects is anchored in Symanzik's improve- 
ment program which leads to adding 0(a) correction terms to the lattice 
action. Such a term has been worked out for lattice QCD some time ago [|] 
and comes under the name of a Sheikholeslami-Wohlert (SW) term. It has a 
coefficient, c sw , which has to be fixed so as to cancel the 0(a) effects. This 
kind of improvement has been used already in quenched simulations (see [|[] 
for a review). Recently, using PCAC relations within the Schrodinger func- 
tional setup Uj, a non-perturbative determination of the coefficient was 
proposed and, at least in the quenched approximation, a significant effect of 
improvement could be verified. 

It is a very natural next step to go beyond the quenched approximation and 
study the effects of improvement also in dynamical fermion simulations. In 
this paper we want to make a first step in this direction by describing how the 
SW-term can be implemented for simulations of lattice QCD using molecular 
dynamics algorithms like Hybrid Monte Carlo [|| or Kramers equation [0, ||]. 
An important observation is that for the implementation conventional even- 
odd preconditioning can be maintained []. We will give analytically and 
numerically computed cumulant expansions that can serve as checks of the 
implementation. Finally we will discuss the crucial question, whether the gain 
in reducing the lattice spacing effects will merit the overhead of adding a 
SW-term to Wilson's fermion action. 

The theory is established on a Euclidean space-time lattice with size L 3 x T. 
With lattice spacing set to unity from now on, the points on the lattice have 
integer coordinates (t,xi,x2,x 3 ) which are in the range < t < T;0 < Xi < L. 
A gauge field U^(x) e SU(3) is assigned to the link pointing from point x to 

J Of course, also different preconditioning techniques like the one proposed in |l(J could 
be used. 
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point (x + fj,), where ^ = 0,1,2,3 designates the 4 forward directions in space- 
time. The gauge fields assume periodic boundary conditions. The quark fields 
^Aaa(x) are defined on each site of the lattice, where A, a and a are flavor, color 
and Dirac indices respectively. The partition function for Wilson QCD is given 
by, 

Z = J VUVyEtyexp (S g -S w -S sw ) . (1) 

The first term S g is the pure gauge action and is given by 

S fl = -§£Tr(I/p + 4) . (2) 
p 

The symbol U P represents the usual plaquette value on the lattice. The bare 
gauge coupling g is related to (3 by gl = 6//3. The second term in the action 
is the usual Wilson fermion action: 

S w = Y,^){D + m)^{x) , (3) 

X 

where m is the bare quark mass. The Wilson difference operator D which 
appears in the above expression is given by: 

D = ^E>( v ^+ v ;)- v ^ v ; . 

V^(x) = Un(x)il>(x + fi) - ij){x) , 

VJMaO = 1>(x)-Ut(x-vW(x-ri . (4) 

The third term S sw in the action is the Sheikholeslami-Wohlert (or clover) 
term. We write it as: 

S 3 w = ^c sw ^ 4>(x)a^J 7 fl „(x)^(x) , (5) 

where c sw is a parameter that has to be tuned such that this term will eliminate 
the 0(a) effects in the on-shell physical observables (Symanzik improvement). 
To tree level, this parameter should be set to unity. The Dirac matrices a^ v are 
defined in the usual way via 7-matrices. The antisymmetric and antihermitian 
tensor Tp V {x) is a function of the gauge links and given by: 
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Figure 1: The diagram that represents J-^ v 



X + JI - V 



T^(x) = ~[U^x)U v (x+fi)Ul(x + i>M(x) 

+ U v {x)Ul(x + P - p)Ul{x - p)U„(x - ji) 

+ U^(x - fi)Ul{x - v - fa)Un(x - v — p)U v {x - v) 

+ Ul{x-v)U^x-i>)U v {x-i> + jl)Ul{x) 

- h.c] . (6) 

We can represent this quantity by its corresponding diagram (see Fig. |1|), which 
suggests the name "clover". 

We will assume that the mass matrix in equation (|3p is proportional to the 
unit matrix in flavor space, namely we are considering two flavors of Wilson 
fermions with degenerate masses. The fermion fields in the partition function 
can be integrated out, resulting in the fermion determinant. This determinant 
is then "integrated in", using the pseudofermion (boson) fields 4>H X ) ar| d <l>(x) 
for the convenience of the simulation. We then write the partition function 
and effective action of the theory as: 

Z = J VUV^V<Pe~ S "" , 
Seff = Sg + ^Q- 2 ^ , (7) 
where the matrix Q is related to the fermion matrix M by Q = c j 5 M and, 
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including the SW-term, is given by: 

Q{U) xy = c 7 5 [(l + -^c sw Ka^T^(x))5 Xt y 

}} , (8) 

with k = (8 + 2m)- 1 and c = (1 + 8k) _1 . The notation a^T^ in the above 
expression, and henceforth, implies the summation over \i and v. 



2 Implementation of the SW-term 

In this section we will discuss how the SW-term is implemented for simulations 
using Hybrid Monte Carlo or Kramers equation algorithms. Although these two 
algorithms show a comparable performance, the Kramers equation algorithm 
seems to be advantageous due to possible problems of non-reversibility in the 
Hybrid Monte Carlo algorithm [[J. Since the use of preconditioning techniques 
in fermion simulations appears to be very important we will discuss directly 
how to install the SW-term with, in our case, even-odd preconditioning, see 
also [|llj] for a similar discussion. Let us write the matrix Q in equation (|8|) as 

g = C ° 75 ( M oe l + Too ) ' (9) 
where we have introduced the matrix T ee (T 00 ) on the even (odd) sites as 

{T) X aot,yb(3 = \c sw nof u Tf u (x)5 xy . (10) 

The off-diagonal parts M eo and M oe , which connect the even with odd and odd 
with even lattice sites respectively, are just the conventional Wilson hopping 
matrices. Preconditioning is now realized by writing the determinant of Q, 
apart from an irrelevant constant factor, as 

det{Q) oc det(l + T ee )detQ 

Q = c 0lb {l+T oo -M oe {l+T ee )- l M eo ) . (11) 

The constant factor c is given by c = 1/(1 + 64k 2 ). For the simulation with 
even-odd preconditioning, one introduces the Hamiltonian: 

H = Y J \Tr{Hl(x)) + S { : f ° ) f {U,, ( f ) \d>) , (12) 

X.fl 
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where H^(x) represents the momentum conjugate to the gauge field f7 M (x) and 
takes values in the Lie algebra of SU(3), i.e. a traceless hermitian 3x3 matrix. 
The effective action for the even-odd preconditioned case is given by: 

S$ = S g [U^] + S det [U^] + S^U^tf^] , 

SglUj = -(l3/6)Y,Tr(U P + Ul>) , 
p 

Sdet[U^} = -2Trlog(l + T ee ) , 
S b [U^(j>\4>] = $Q- 2 4> . (13) 

The first term S g in S^J is just the usual Wilson pure gauge plaquette action. 
The second term involves the determinant of the matrix (1 + T ee ) on the even 
lattice sites. We shall refer to it as the determinant contribution. The third 
term is the determinant of the preconditioned matrix Q which acts only on the 
boson fields at odd lattice sites. We will call 5 6 the bosonic contribution. 

The preconditioned matrix Q contains the inverse matrix (1 + Tee) -1 which is 
needed only locally at a given lattice point x. Separating the lower and upper 
components of the fermion fields, the computation of the inverse amounts to 
the inversion of two complex 6x6 matrices per lattice site. This can be done 
using, for example, a Householder triangularization [|12[] which is advantageous 
on parallel computers, especially on the Alenia Quadrics (APE) machines we 
are using here [jT3|. In order to study, whether there might occur problems 



with inverting the matrix (l + T ee ), we measured the lowest value D mm of the 
determinants det(l + T ee (x)) for every configuration. As an example, for (3 = 5, 
k = 0.150 and c sw = 1.52 we find that D min = 0.584(2). Noting that the critical 
hopping parameter n c is lowered for SW-improved fermions as compared to 
conventional Wilson fermions and, since for larger values of (3 and(or) 



lower values of k, D min is increasing, we do not expect problems with the 
matrix inversion for realistic QCD simulations. 

Since the bosonic part S b is quadratic in the <f> fields, they are generated at 
the beginning of each molecular dynamics trajectory via 

<j) = QR , (14) 

where R is a random spinor field taken from a Gaussian distribution of norm 
one. The gauge fields and its corresponding momenta are updated according 
to: 

U^x) = ex.^{iH ll 5t)U ll {x) , 
iSH^x) = [U^x)F^x)] t .aM , (15) 



(i 



where [• • -} t .a. stands for the traceless antihermitian part of the matrix QTHJ and 
the quantity [U fli (x)F fl (x)] T .A. is the "total force" associated with the link U^(x) 
which will be discussed in more detail below. In the simulation the update 
eq.(p3|) is realized via a leapfrog integration scheme. 

In molecular dynamics algorithms, is the "coefficient" in the change of 
the effective action when an infinitesimal change of the gauge link 5U^(x) is 
applied, i.e. 

SS e7f =Y, T <W 5U ^ +F l( X ) SU l( X » ■ ( 16 ) 

The quantity F^x) receives three contributions. The first one, V^x), is coming 
from the pure gauge part. The second is from the determinant of the SW-term 
on the even sites. The third one is coming from the change in ^Q" 2 ^. We 
will now discuss the last two terms in some detail. 

Let us start with the third contribution, namely the variation of the effective 
action from the bosonic term S b . A direct variation of this term gives: 

SS b = -(XlSQY + YjSQX ) , (17) 

where X Q = Q~ 2 4> and Y a = Q~ x $ are fields defined only on odd lattice sites. 
They are obtained by inverting the preconditioned matrix Q 2 with a given 
source 4> . 
Using 

SQ = c o7 5 [ST 00 - SM oe ( 1 + T ee ) - 1 M eo - M oe ( 1 + T ee ) - 1 SM eo 

+ A/ oe (i + r ee )-Mr ee (i + r ee )- 1 Af eo ] , (18) 
we find that the variation SS b can be written as 

6S b = -(c /c )(X^6QY + Y^dQX) , (19) 

where now, the fields X and Y are defined over the full lattice. For the field 
X, we have: 



X=[ I,Jeo ^° , (20) 




and similarly for the field Y. The variation of the original matrix Q again 
consists of two parts. The contribution from the off-diagonal elements is the 
same as in the conventional Wilson case. This term will be denoted as F^"\x) 
for a given link U^(x). Therefore, the only new ingredient is from the diagonal 
terms which explicitly involves the contributions of the SW-term. To obtain 
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Figure 2: The diagrams that contribute to Fjf w \x), eq . (|21|) . 
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this contribution, let us focus on the variation of one particular link variable 
U^(x), keeping all others constant, and try to collect all possible contributions. 
One easily finds out that, in order for the "clover leaves" of the SW-term to 
contain the chosen link U^x), the center of the clover can only be at lattice 
points x, x + \x, x±v and x + fi ± v, where v is some direction different from 
[i. After realizing this, it is just a matter of book-keeping to gather all the 
possible terms. We will represent them collectively in a diagrammatic fashion, 
see Fig. ^|. We write the contribution, Fjf w \x), from these diagrams as: 



p{sw 

/' 



>(z) = - 



( diagrams in Fig. 2 ) 



(21) 



The diagrammatic representations have the following simple interpretation. 
Starting from point x + ^i, one can, following the direction of the arrows on the 
lines, go around the plaquette and finally arrive at point x. In doing this, one 
performs matrix multiplications along the way. Each line segment with an arrow 
on it represents the corresponding gauge link at that point. A black square at 
a given point, with a particular choice of ^ and v, stands for an "insertion" of 
a 3 x 3 matrix in color space. This matrix is given by the following expression: 



(^)a: — Tr jyi rac (ij^a fj_^Y (x) ® (x) + ij^a^^ 



(22) 



The trace in the above formula, as indicated, is done only in Dirac space and 
X stands for a direct product in color space. For a fixed p,v pair, one can 
take two different paths to arrive at point x, namely the "upward" path, which 
passes through point (x + v), and the "downward" path, which passes through 
point (x-v). The path that winds downward gets a relative minus sign for its 
contribution. Finally, one should sum up all the contributions from all possible 
v ^ [i values. 
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Figure 3: The diagrams that contribute to F^ det) (x), eq.(§|). They are only included 
when the insertion points are even lattice points, as indicated in the label. 



Let us turn now to the determinant contribution from the even sites. We 
have, when taking the variation of the action, 

SS det = -2Tr(l + T ee )- 1 6T ee . (23) 

This contribution can also be easily represented by a set of diagrams. In 
fact, by almost the same set of diagrams as in the previous case, except for 
the following changes. First the black square insertions at various points are 
replaced by the black triangles at the same points. And each black triangle, at 
some chosen point x and direction v ^ fx, represents the following expression: 

{A) x ^Tr Dlrac [ia^{l + T ee {x))- 1 } . (24) 

The second modification is that these insertions should only be done at even 
lattice points, otherwise the contribution is set to zero. With these modifica- 
tions, the contribution from the determinant Fjf et \x) is given by 



'-( diagrams in Fig. 3 ) 



(25) 



Finally, F^(x) is written as a sum of all the contributions as: 

F,(x) = V,{x) + Fj?\x) + Fl^{x) + FjrHx) . (26) 

Let us mention, that even-odd preconditioning can also be achieved in a 
manner slightly different from the case just discussed. In this case one treats 
the even and odd points more symmetrically. This amounts to rewriting the 
determinant of the fermion matrix as: 



det{M) = det(l + T ee )det(l + T OQ )detM 



(27) 
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Then, the determinant of M^M is realized by introducing the bosonic Gaussian 
fields while the determinants of (1 + T ee ) and (1 + T ee ) could be treated the 
same way as discussed before. In this version the determinant contributions 
are then summed up on all lattice sites, even and odd. At present, it is not 
clear, whether one of the two possibilities of preconditioning the fermion matrix 
should be preferred. 

3 Tests of the implementation 

Applying an infinitesimal change to the gauge links, the subroutine which eval- 
uates F M (x) was first checked using eq . fll6p by computing the change of the 
action explicitly and by using F^(x). Besides this, the focus of our additional 
tests will concern the SW-term alone. For this purpose, it is advantageous to 
study cases in which the effects of the SW-term have been singled out. The 
SW-term that enters the simulation program only depends on the parameter 
combination c sw = c sw n. Therefore, one can take the limit k — > but keep c sw at 
some prescribed value. In this way, the effects of the Wilson hopping matrices 
are completely switched off and the fermionic part of the action depends only 
on the SW-term. Of course, the pure gauge parameter (3 is still at our disposal 
and the SW-term can be tested at various values of j3. 

3.1 Tests at p = o 

As a first test, we study the model in the limit of = and k = 0, however, 
c sw k is kept finite. In this limit, the effective action of the theory is simply: 

S e// = -2]TTrlog(l+T(:r)) , (28) 

X 

with T(x) as given in eq . (ITOf) . The advantage of taking this limit is that, in 
this simplified case, the average plaquette value can be evaluated analytically 
as a power series expansion in terms of c sw . For the expectation value of a 
plaquette, P = (l/6)tr(U P + U ] P ), to lowest order one gets 

<P>=~c 2 sw + 0(^ sw ) . (29) 

To work out the next leading correction of this expansion requires some more 
work. Basically one is led to the following general expression: 

< P > = c 2 sw < PTr{oTf > -^f- < PTr(oTf > 
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c sw expansion at (3 = 
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Figure 4: We plot the average plaquette value as measured from our Monte Carlo 
simulations for various values of c? sw at ft — on a 4 4 lattice. This is to be compared 
with the analytic result obtained from the small c sw expansion which is plotted as the 
lines. The dotted line indicates the result of the leading order (eq. (^9|)) while the solid 
line also includes the next leading order in the expansion (eq. d3l])). 
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r 4 

+ _|B. < PTr{oTfTr{oTf > 

- <4 < PTr(aT) 2 > < Tr(aT) 2 > , (30) 

with < O > = J dUO(U) and dt/ being the normalized group measure. In 
the above formula, we have used the shorthanded notation oT to stand for 
0>„.7>„/2- The last two terms correspond to the connected contribution of the 
operators P and Tr(aT) 2 . The calculation of these terms is quite straightfor- 
ward but rather technical. We list some of the details in the appendix. This 
calculation results in: 

~4,(i + §||4,) ■ (si) 

We made simulations with SW-improved action in the Schrodinger func- 
tional setup at various values of c sw at (3 = on a 4 4 lattice and the results 
are shown in Fig. f| together with the prediction of eq. (p9|) and eq. (|3l| ). For 



small enough values of c 2 w (less than O.Ol), the lowest order of the expansion 
is already satisfactory. For larger c 2 w values, the deviation of the data from 
the lowest order prediction is quite significant. However, if one also includes 
the second order contribution, all data points are in good agreement with the 
anticipated values up to c 2 sw ~ 0.1. 



3.2 Tests at nonvanishing p 

For a nonvanishing value of (3 and any observable O, one can obtain an expan- 
sion in terms of numerically computed cumulants at c sw = 0. We have 

< O > = <O> +2c 2 sw [< OTr^T^T^) > - < O > < Tr^^T^) >„] 

Q 

+ [< OTr^T^T^Tf^) > - < O > < T^T^T^T^) > ] . (32) 

In the above formula, the notation < ... > stands now for the Monte Carlo 

average at c sw = 0. We have performed a high statistics run at a fixed (3 value 

with c sw = 0, measured the cumulants above and constructed the expansion. 

On the other hand, we have made simulations at a nonvanishing value of c sw 

directly and compared with the cumulant expansion. This test was performed 

at f3 = 6 on a 4 4 lattice and the result is illustrated in Fig. [j| 

Obviously, for the two quantities that have been measured, namely the 

average plaquette value and the average value of Tr^^T^), the measured 

2 For the timelike plaquettes touching the boundary within the Schrodinger functional, 
eq.© is replaced by -(1/24)5^(1 + (491/384)5^). 
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= 6.0 




Figure 5: We plot the average plaquette value and the quantity Tr^T^T^v) as measured 
from our Monte Carlo simulations for various values of c 2 sw . All of these data points stay 
within the shaded region as predicted by the cumulant expansion, which is constructed 
from a high statistics run (5.5 million measurements) at c sw = 0. The shaded bands 
represent the error of the cumulant expansion obtained by analyzing the blocked errors of 
the various cumulants at c sw = 0. 



values from the simulation agree very well with the prediction of the cumulant 
expansion. The width of the shaded bands in the figure represents the statistical 
uncertainty of the cumulant expansion which is constructed from a run at 
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c sw = with 5.5 million measurements. In this analysis, we only took into 
account the blocked statistical errors of the run at c sw = 0. The 0(c 3 sw ) term 
in the cumulant expansion ( ^2|) is also included, the effect of which is however 
rather small. It is clear from the figure that the systematic errors in this case 
should be smaller compared with the statistical errors for the parameter range 
we are considering here. Of course, by measuring more cumulants at c sw = 0, 
one could also construct higher order corrections in the cumulant expansion. 

4 Timing of the Program 

When comparing with the conventional Wilson QCD simulation, the timing of 
the version with the improved fermion action is a crucial piece of information. In 
a typical QCD simulation using molecular dynamics algorithms, the conjugate 
gradient (CG) iterations dominate the CPU time of the program. Therefore, 
in the limit of number of conjugate gradient iterations N C g going to infinity, 
the time of the matrix-vector multiplication is the most crucial part. In our 
version of Symanzik improvement, this multiplication turns out to be only 
slightly slower than the conventional Wilson case by about 15 percent. The 
force evaluation, in this case, adds some overhead to the program which does 
not depend on N CG . It is convenient to express all times in units of the matrix- 
vector multiplication of the conventional Wilson case. 

To be specific, we propose the following formula for the time of the program: 

Twilson = 4°) + (c£> +cW)N CG , 

Tsw = c(°J + ( C «+ci 2 J)iV CG . (33) 

In the above formula, the coefficient cffi = 2 is, by definition, the number of 
matrix-vector multiplications needed for each CG iteration in the conventional 
Wilson case. The coefficient cL 1 - 1 represents the cost of other operations for 
each CG iteration (linear combinations, inner products etc.). Typically, this 
coefficient is quite small. The coefficient c&' is the overhead that does not 
depend on N C g- The value of Cw depends on the implementation of the 
program. However, in the asymptotic region where N C g is large, the effect of 
becomes irrelevant. 

The quantities ciS, ciw and ci2 are the corresponding coefficients for the 
program with the Sheikholeslami-Wohlert action. In this case, cf2 = 2.3 is 
slightly larger than the corresponding value in the Wilson case. The value of 
Csw = remains the same and we find, for our implementation of the SW-term 
on the APE computer, the coefficient cf2 ~ (50 + c£') to be larger than that of 
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Figure 6: The CPU time for the algorithm of the improved action is plotted as a function 
of the number of conjugate gradient iterations per step (Ncg) relative to the conventional 
Wilson fermion simulation. The solid and the dashed lines correspond to the choice of 
= and = 50 in eq. (33) respectively. We see that, in the regime of Ncg 
values expected for QCD simulations, the algorithm for the improved action takes about 
20 percent more time to generate a configuration. 



the Wilson case. In Fig. || we plot the CPU time of our program with improved 
fermions relative to the conventional Wilson case as a function of the number 
of CG-iterations per molecular dynamics step. The solid and dashed curves in 
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the figure correspond to the choice of cl 0) = and cL? ) = 50 respectively, which 
we consider to be two rather extreme cases. For typical simulations, the curve 
should lie between these two. In any case, we see that, for the most interesting 
physical situation in which N C g ~ 200 or more, the simulation with improved 
fermions is only about 20% slower, quite independent of the value of 



5 Conclusions 



We demonstrated in this paper that the SW-term as required by Symanzik's 
improvement program to cancel 0(a) effects in Wilson QCD can be imple- 
mented straightforwardly for simulation algorithms like Hybrid Monte Carlo 
or the Kramers equation. For the implementation even-odd preconditioning 
can be maintained which is an important aspect concerning the performance 
of the algorithms. We designed tests for checking the implementation of the 
SW-term by analytical and numerical computations of cumulant expansions. 

The most important outcome of our work is illustrated in Fig. 6 which 
compares the time of a program with conventional Wilson fermions with the one 
where the SW-term has been added. The figure nicely demonstrates that the 
overhead due to the SW-term in the lattice action is only about 20% when the 
number of conjugate gradient iterations exceeds about 200 iterations, which is 
a number easily reached in realistic simulations of lattice QCD. On the other 
hand, Symanzik improvement will enable us to perform QCD simulations with 
much smaller lattice artifacts than the conventional Wilson fermions. Since 
the cost of a QCD simulation is proportional to a large power of a~\ any small 
factor of improvement in lattice spacing will reduce the cost of the simulation 
significantly. 

This brings us to the conclusion that simulations with SW-improved dynam- 
ical Wilson fermions can be done without much more cost than conventional 
Wilson fermion simulations. We expect that the cancellation of the strong 
0(a) effects, as have been seen in quenched simulations, will by far merit the 
small overhead due to the addition of the SW-term. 

A last remark concerns the multiboson technique for simulating lattice QCD 
16j [17| . This relatively new method shows a comparable performance to the 



molecular dynamics algorithms and can be considered as a real alternative to 
the Hybrid Monte Carlo or Kramers equation algorithms. However, for the 
multiboson technique it is not straightforward to implement the SW-term, if 
one wants to use heatbath or over-relaxation algorithms. We propose therefore, 
to use exact local versions of the Hybrid Monte Carlo or Kramers equation 
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algorithms []IB[ for the simulation. For these algorithms, the implementation 
of the SW-term described here directly applies. 
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A Analytic computation of the cumulant expansion at = 



In this appendix, we discuss the analytic calculation of the cumulant expansion 
for the average plaquette value at (3 = in some detail. Our aim is to calculate 
the ensemble average of the plaquette which is defined as P = (l/6)tr(U P + 
Up). We will only consider the case of periodic boundary conditions for the 
gauge fields here. The plaquette is specified by its (geometrical) "orientation" , 
which designates the plane that the plaquette lies in, and its two possible 
"circulations" , one corresponding to U P and the other to Up. When we expand 
the effective action (|2"%| ) into power series of c 8W , at each order, we will have 
products of plaquettes of various orientations and circulations. However, in 
order to obtain a nonvanishing contribution after group integrations, all the 
plaquettes that appear in the product have to be "tiled" according to the 
following simple rule: 

For any plaquette appearing in the expansion, the number of factors of 
one circulation minus the number of factors of the opposite circulation 
has to be (mod 3). 
The average plaquette value, up to 0(c^ w ), is given by equation (^0]) in the 



main text. To obtain the contribution of 0(c^ w ), one has to evaluate the 
contributions from the following two cases: (1) the connected contributions 
of P, Tr{aTf and Tr(uTf and (2) the contributions from PTr(aT) 4 . We will 
now study these two cases separately. Since tr(U P ) and tr(U P ) always give the 
same contribution, we will just consider the former. In the calculation below, 
the following common factor always appears: 

1 c 2 c 2 

3 6 64 64 ' y ' 

which comes from the definitions of the plaquette and T^ v . 



A.l The connected contribution <PTr(oT) 2 Tr(oT) 2 > c0 

The connected contribution < PTr{aT) 2 Tr{aJ : ) 2 > c0 is defined to be 

< PTr(oT) 2 Tr(oT) 2 > c0 = < PTr(aT) 2 Tr{aT) 2 > 

- 2 < PTr(oT) 2 > < Tr{oT) 2 > . (35) 

On the right hand side of the above equation, the first term receives contri- 
butions of the type tr{Up)tr(UpU P )tr(Up<U P ,) which exactly cancels the second 
term. The remaining contributions only come in from two different cases which 
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we will call (a) and (b). In case (a), the two factors of Tr(aT) 2 are sitting 
at neighboring corners of the plaquette to be measured. The link which con- 
nects these two corners is also an edge of another plaquette P' ^ P which 
has the same geometrical orientation. Then, the combination of the type 
tr(Up)tr(UpUp>)tr(U P ujy,) fulfills the rule and gives a contribution of 

< PTr{oT) 2 Tr{oT) 2 >&>= *H • c fac . (36) 

In case (b), the situation is the same as in case (a) except that two factors 
of Tr{oT) 2 are sitting at the same corner of the plaquette to be measured. 
Now, since we have 3 different ways of choosing the other plaquette P', we get 
a contribution: 

< PTr(oTfTr(oTf >$ = 3 < PTr(oT) 2 Tr(oT) 2 . (37) 
This concludes the calculation of the contribution < PTr{oT) 2 Tr(oT) 2 > c0 . 

A. 2 The contribution from < PTr(oTf > 

In this case, there could be two possibilities concerning the Dirac structure of 
the factors from oT: 

(A) All 4 factors have the same geometrical orientation as the plaquette 
to be measured. 

(B) Two of the 4 factors have the same geometrical orientation as the 
plaquette to be measured, the other two take orthogonal orientations. 

We will now discuss these contributions separately. 

For case (A), we have the following 3 possibilities which we will list as (Al), 
(A2) and (A3). In all these cases, the Dirac trace simply gives a factor of 4. In 
case (Al), we consider the combination tr{U P U P U P >U P ,) , P ^ P' . Note that 
there are 4 such terms in Tr{aT) i . The contribution is: 

< PTr(aT) 4 >^ 1} = 384 • c fac . (38) 

In case (A2), the situation is almost the same as in case (Al) with the 
exception that the structure is of the type tr(U P U P >U P U P ,). Since there are 2 
such terms in Tr(aTY, the contribution is: 

< PTr(oTf >t 2) = -64 • c fac . (39) 



19 



In case (A3), at least one of the oT factors contributes a Up which has a 
circulation opposite to that of the plaquette to be measured. This contribution 
is given by: 

< PTr(aT) 4 > ( Q A3) = 16 • c fac ■ I , (40) 
where the group integral I is given by 

/= [ dUtr(U)tr[(U -tff]= [ dUtr{U)tr[(U^) i - AU 2 ] . (41) 



This integral, as well as the other group integrals appearing in the different 
contributions, can be evaluated using the techniques described in appendix B. 
The value of / in fl41|) equals 5. This concludes the discussion of case (A). 

For the case (B), we have the following two possibilities which we call (Bl) 
and (B2). 

In case (Bl), the situation is very similar to case (Al) above, i.e. the 
contribution has the form tr(UpUpUp>U P ,). In this case, the Dirac trace will 
just give a factor of 4. The two factors Up, and Up, can take the 5 orientations 
orthogonal to the one specified by the measured plaquette. They can also take 
all 4 possible leaves and 2 possible circulations in the clover. This contribution 
turns out to be the dominant one at this order. We have: 

< PTr{oTf > { B1) = 2560 • c fac . (42) 

Case (B2) is closely related to (A2) and the structure is tr(UpUp>U P Up,). 
However, since P' does not have the same geometrical orientation as P, the 
Dirac trace should be done with more care. Basically, for any given orientation 
of the measured plaquette, among the other 5 orthogonal orientations, only 
one of them gives a Dirac trace of 4 while the other 4 orientations give a 
Dirac trace of (-4) due to the anticommuting properties of the a matrices. 
Therefore, effectively there appear to be only 3 orientations to contribute a 
Dirac trace (-4). We get 

< PTr{aTf > { Q B2] >= 256 • c fac . (43) 

This concludes our calculation for case (B). 

Adding all contributions up, gives the result quoted in the main text. 

B Evaluation of group integrals for su(3) 

Here we discuss some techniques to evaluate the group integrals for the group 
SU(3). The integrals that we wish to evaluate are of the form 

{ dU[tr{U)] ll [tr{U- 1 )Y 1 [tr{U 2 )] l2 [tr{U^ 2 )] 32 ■•• , (44) 
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where U is an element of SU(3) and the measure dll stands for the normalized 
Haar measure of the group. The first step is to use the Cayley relation for an 
51/(3) matrix: 

U 3 = tr{U)U 2 -tr{U- v )U + l , (45) 

which could be established easily from the fact that (U- \i)(U- A 2 )(J7- A 3 ) = 0, 
where Ai, A 2 and A 3 are the eigenvalues of U and unit matrices are to be 
substituted appropriately. With the help of this relation, we can transform 
terms with powers of U higher than 1 into powers of 1, and (-1). This means 
that, without loss of generality, we can discuss the following integral: 

J dUitriU^itriU- 1 )} 31 . (46) 

This integral can be done by realizing that the integrand above is nothing 
but the character of the representation 3* 1 3 3 ' 1 , assuming U is in the 3 rep- 
resentation. Therefore, the above integral is equal to the number of trivial 
representations in this direct product. All one has to do now is to decompose 
the direct product 3 ll ®3 jl and count how many times the trivial representation 
appears. 

As an example, let us consider the integral 

J d?7[ir(C/- 1 )][<r([/ 4 )] . (47) 
Using Cayley relation, this is transformed into: 

J dU [*r(J7 _1 )tr(E/) 4 - Atr{lJ- l )Hr{Uf + 2tr(U^ 1 f + 4<r(C/" 1 )ir(t/)] . (48) 

The last term just gives 4 and the third term gives 2. One easily finds out 
that 3 3(8)3(8)3 contains 2 trivial representations. So the second term yields 
(—4) >2 = (-8). The first term requires the decomposition of 3(8>3<g)3<g>3<g)3 and 
one finds that it contributes a 3. Therefore the integral finally yields a value 
of 1. 
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